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Abstract 

In genetic association analyses, it is often desired to analyze data from multiple potentially- 
heterogeneous subgroups. The amount of expected heterogeneity can vary from modest (as might 
typically be expected in a meta-analysis of multiple studies of the same phenotype, for example), to 
large (e.g. a strong gene-environment interaction, where the environmental exposure defines discrete 
subgroups). Here, we consider a flexible set of Bayesian models and priors that can capture these 
different levels of heterogeneity. We provide accurate numerical approaches to compute approximate 
Bayes Factors for these different models, and also some simple analytic forms which have natural 
interpretations and, in some cases, close connections with standard frcqucntist test statistics. These 
approximations also have the convenient feature that they require only summary-level data from each 
subgroup (in the simplest case, a point estimate for the genetic effect, and its standard error, from 
each subgroup). We illustrate the flexibility of these approaches on three examples: an analysis of 
a potential gene-environment interaction for a recombination phenotype, a large scale meta-analysis 
of genome-wide association data from the Global Lipids consortium, and a cross-population analysis 
for expression quantitative trait loci (eQTLs). 
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1 Introduction 



In this paper, we develop Bayesian methods for analyzing genetic association data, allowing for poten- 
tial heterogeneity among (pre-specified) subgroups. We are motivated by two distinct settings where 
heterogeneity may arise. The first setting is meta-analysis of multiple association studies of the same 
phenotype. These studies are usually carried out by different investigators, at different centers, and so 
heterogeneity (or apparent heterogeneity) of genetic effects might be expected (e.g. due to differences 
in the way phenotypes are measured, or due to systematic differences between individuals enrolled in 
each study). Such meta-analyses have become an increasingly popular and important statistical tool 
for detecting modest genetic associations that are too small to be detected in smaller individual studies 



(Teslovich et al. (2010), Zeggini et al. (2008)), and frequentist methods focused on allowing for het- 



erogeneity in this setting were recently proposed by Lebrec et al. (2010) and Han and Eskin (2011). 



The second setting is where genuine biological interactions may cause some genetic variants to exhibit 
different effects on individuals in different subgroups; for example, genetic effects can differ in males and 



females even at autosomal loci (Kong et al. (2008), Ober et al. (2008)). And in gene expression analyses 
that aim to detect genetic variants associated with gene expression levels, data are often available on 



individuals from different continental groups (Stranger et al. (2007), Veyrieras et al. (2008)), or on 



different tissue types (Dimas et al. (2009)), where heterogeneity of effects may be expected. 



These two settings differ in the extent of the heterogeneity expected: for example, interactions could 
cause genetic variants to have effects in different directions in different subgroups, whereas this might be 
considered unlikely in the meta-analysis setting. They also differ in the extent to which heterogeneity 
may be of direct interest (e.g. in interactions) or largely a "nuisance" (e.g. in meta-analysis). However, 
the two settings also share an important element in common: the vast majority of genetic variants 
are unassociated with any given phenotype of interest, within all subgroups. Consequently, it is of 
considerable interest to identify genetic variants that show association in any subgroup, or in other 
words to reject the "global" null hypothesis of no association within any subgroup. This focus on 
rejecting the global null hypothesis distinguishes genetic association analyses from other settings, and 



calls for analysis approaches tailored to this goal; see Lebrec et al. (2010) for relevant discussion 
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The methods we consider here are suited to the analysis of genetic association studies, and are sufficiently 
flexible to handle a range of settings in which heterogeneity may be an issue, from meta-analyses to gene- 
environment interactions. In brief, we introduce families of alternative models that allow for a range 
of different effect sizes and levels of heterogeneity, and then address questions of interest by comparing 
the support in the data for these different models vs the global null. Within this framework the goal 
of testing the global null is accomplished by assessing the overall support for any of the alternative 
hypotheses, whereas the goal of examining heterogeneity among groups is achieved by comparing the 
relative support for different alternative models. 



Although there has been previous work on Bayesian methods for meta-analysis (e.g. Sutton and Abrams 



(2001), Stangl and Berry (2000), DuMouchel and Harris (1983), Whitehead and Whitehead (1991), Li 



(1994 


), 


Eddy et al. 


(1990 





Burgess et al. (2010), Mila and Ngugi (2011)), our focus here is somewhat different to much of this. 
First, due to the nature of genetic association studies mentioned above, our focus is particularly on 
testing and model comparison, via Bayes Factors, rather than on estimation. Second, due to the fact 
that in genetic studies one often wishes to perform millions of analyses, we focus on obtaining fast 
numerical approximations to Bayes Factors. Finally, we provide novel results that connect the Bayesian 
methods with standard frequentist test statistics used in many meta-analyses. Specifically we show 
how for certain prior assumptions the Bayesian analyses depend on the phenotype data, asymptotically, 
only through these test statistics, and that the common practice of ranking SNP associations by these 
standard test statistics is equivalent to making specific (and not necessarily realistic) prior assumptions. 



2 Models and Methods 

We start with specifying models for quantitative traits, derive efficient methods for computing (approx- 
imate) Bayes Factors comparing these models, and discuss their statistical properties and connections 
with frequentist test statistics. Later, we generalize these results to binary outcomes in case-control 
studies. 
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2.1 Notation and Assumptions 



Assume (quantitative) phenotype data and genotype data are available on S pre-defined subgroups, 
and focus on assessing association between the phenotype and each genetic variant, one at a time. For 
convenience we assume each genetic variant is a Single Nucleotide Polymorphism (SNP), although our 
methods could easily accommodate other types of variant. We assume that the data within subgroup 
s come from randomly-sampled unrelated individuals. Let the ris-vectors and Qg denote, respec- 
tively, the corresponding phenotype data and the genotype data at a single "target" SNP. We also 
let Y = (pi, . . . ,yg) and G = {qi, . . . ,gg) denote the complete set of phenotype and genotype data 
respectively. 

2.2 Hierarchical Models for Quantitative Traits 

In this section, we introduce a set of models for describing heterogeneous genetic effects for a target 
SNP across subgroups. 

Within each subgroup, we model the association between phenotype and genotype using a standard 
linear model. Without loss of generality, in subgroup s, we assume 

y, = + I3s9s + , e, ~ N(0, a^J). (2.1) 

Here, we also assume residual errors are independent across subgroups. 

The "global" null hypothesis of interest is that there is no genotype-phenotype association within any 
subgroup; that is, /3s = for all s. 

Under the alternative hypothesis we begin by assuming that the genetic effects among subgroups are 
exchangeable, and more specifically that they are normally distributed about some unknown common 
mean. We consider two different definitions of genetic effects: the "standardized effects" bg := Ps/os, 
and the unstandardized effects, (3s, leading to the following models: 

1. Exchangeable Standardized Effects (ES model). Under this model we assume that the standardized 
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effects bs are normally distributed among subgroups: 

bs\cTs ^ N{b,(l)^) or equivalently, (3s\(Ts ^ N{asb,a^J^), (2.2) 

so the hyper-parameters b and cp characterize, respectively, the mean and variance of effects among 
subgroups. We also assume a normal prior distribution for b, 

6~N(0,w2). (2.3) 

Finally, for the parameters ag and fi^, which are common to both the null and alternative hy- 
potheses, we use the convenient conjugate priors 

/x,|(7,~N(0,a,V); a;2^r(m,/2,Z,/2). (2.4) 

When performing inference we consider the posterior distributions that arise in the limits — > oo 
and ls,ms — )• (which correspond to standard improper priors for a normal mean and variance, 
and result in proper posteriors). 

2. Exchangeable Effects (EE model). Under this model we assume that the unstandardized effects 
Ps are normally distributed: 

Ps^^iP,^^^), (2.5) 

where P and ^ play similar roles to b and cf) in the ES model. We also assume a normal prior for 

P, 

/3~N(0,u;2), (2.6) 
and priors for {fj,s,(^s)- 

/x,~N(0,tx^); a-^ ^r{ms/2,ls/2). (2.7) 
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For each subgroup this prior specification is effectively the semi-conjugate prior commonly used 
in Bayesian linear regression. As above, we consider the limits — )• oo and Ig, ms 0. 

In both the ES and EE models the alternative hypothesis involves two key hyper-parameters, one {u 
in the ES model and w in the EE model) that controls the prior expected size of the average effect 
across subgroups, and another (0 in the ES model and il^ in the EE model) that controls the prior 
expected degree of heterogeneity among subgroups. A complimentary view is that -|- cp'^ (respectively, 
w'^ + ip'^) controls the expected (marginal) effect size in each study and (p/io (respectively, tp/w) controls 
the degree of heterogeneity. 

Of the two models, the ES model has the advantage that it results in analyses (e.g. Bayes Factors) that 
are invariant to the phenotype measurement scale used within each subgroup. This not only makes it 
more robust to users accidentally specifying phenotype measurements in different subgroups on different 
scales (possibly a non-trivial issue in complex analyses involving collaboration among many research 
groups), but also means that it can be applied when measurement scales may be difficult to harmonize 
across subgroups, for example due to the use of different measurement technologies. For these reasons 
we prefer the ES model for general use. However, in some cases the EE model may be easier to apply. 
For example, if one has access only to published point estimates and standard errors for the effect size 
in each study, then this suffices to approximate the Bayes Factor under the EE model, but not under 
the ES model. Note that the ES and EE models will produce similar results to each other if the residual 
error variances are similar in all subgroups. 

It is also possible to control for additional (possibly study-specific) covariates by including additional 



predictor variables into (2.1). If independent fiat priors are used for the coefficients of these controlled 
covariates within each study then our main results below still hold, effectively unchanged. This treat- 
ment is analogous to the frequentist mixed effects model, where controlled covariates are typically 
assumed to have study-specific effects. 
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2.2.1 A Curved Exponential Family Normal Prior 



In the priors described above the variance of the effect sizes {00"^) is independent of the average effect 
(6). In some settings this independence assumption may seem unattractive. For example, in meta- 
analysis, we may expect genuine genetic association to possess the property that effect sizes across 



studies typically have the same sign (Owen (2009)), regardless whether b is small or large. But the 



independence assumption implies that the probability that the effects have the same sign is much larger 
when b is large than when it is small. To address this we can modify the priors above to allow the 



variance to depend on the mean. For example, under the ES model, we can replace (2.2) with 



N(6,r6^), (2. 



so that 



Pr (hg has a different sign from 6) = $(———), (2-9) 



where <I> is the cumulative probability function of standard normal distribution. Note that (2.9) does 
not depend on b. For example, when k = 1/2, sampling from this prior distribution, the probability of 
obtaining a value of bs having an opposite sign to b is approximately 2.3%. As the value of k decreases, 
the restriction becomes more stringent. When k = 0, the prior indicates all bg are exactly same as 6, 
i.e. there is no heterogeneity of effects across subgroups. 



Similarly, for the EE model, we can replace (2.5) with 



/3, ~N(/3,A:"/3^). (2.10) 

We refer to these alternative priors as "Curved Exponential Family Normal" (CEFN) priors, because 
they involve a functional relationship between the mean and variance. 
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2.3 Bayes Factors for Testing the Global Null Hypothesis 

We now consider computing Bayes Factors for testing the "global" null hypothesis that the phenotype 
is not associated with the target SNP in any of the subgroups, versus the alternative hypotheses out- 
lined above. We focus primarily on the simplest ES model, and give details for the EE model, and 
modifications for the CEFN models, in appendices. 

Recall that the ES model is indexed by two parameters, ^ and oj. Within this model, the global null 
hypothesis, which is most naturally written as = for all s, can also be written as 



HQ:<j) = u = Q. (2.11) 

To compare the support in the data for this null hypothesis with the support for a particular alternative 
ES model specified by parameters {4>,u!), we use the Bayes Factor: 

This Bayes Factor also depends on the hyper-parameters Vs,ls and m^; however, because these hyper- 
parameters are common to both the null and alternative hypotheses, the value of the Bayes Factor is 
not especially sensitive to the values chosen. As noted above we take the limits 

v^^ ^ DO, 0, nis 0, Vs. (2.13) 



Each value of (w, (/)) corresponds to a particular alternative model, with u controlling the typical average 
effect size, and (f> controlling the degree of heterogeneity among subgroups (or in a re-paramctrization, 
+ (fP' controls the expected marginal effect size in each subgroup and (f)/u! controls the degree of 
heterogeneity). There may be reasonable uncertainty about appropriate values for (f) and co due to 
the unknown mechanisms that cause heterogeneity. One simple way to allow for this is to specify a 
(discrete) prior distribution on a set of plausible values {(i;/)^*), w^*)) : i = 1, . . . , M}. We give a specific 
choice of such prior in the applications below. If tTj denotes the prior weight on (^^^^w^*-*) then the 
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resulting Bayes Factor against Hq is the weighted average of the individual BFs: 

M 

BF^^ := ^^iBF^S(0«,L^«). (2.14) 

i=l 

This average could also, of course, be extended to include averages over other models (e.g. one could 
average over some models that use the CEFN prior, and others that do not). The fact that Bayes Factors 
under different models for heterogeneity can be both averaged in this way (to assess evidence against 
the global null, allowing for heterogeneity), and compared with one another (to assess the evidence for 
different levels or types of heterogeneity), is one advantage of the Bayesian framework compared with 
typical frequentist treatments. 



2.3.1 Calculating the Bayes Factors 



Calculating BF^^((/),a;) boils down to evaluating a complicated multi-dimensional integral. In appendix 
[a] we present two different approximations, both based on applying Laplace's method and both having 
error terms that decay inversely with the average sample size across subgroups. The first of these. 



which effectively follows methods from Butler and Wood ( 2002 ) for computing confluent hyper-geometric 



functions, is very accurate, even for small sample sizes. Indeed, for the special case of a single subgroup 
(S = 1), the approximation becomes exact, and for small numbers of subgroups we have checked 
numerically (appendix [D]) that it provides almost identical results to an alternative approach based on 
adaptive quadrature (which is practical only for small S). However, it requires a numerical optimization 
step and has a somewhat complex form, which although not a practical barrier to its use does hinder 

-ES 

intuitive interpretation. In what follows we use BF to denote this approximation. 

The second approximation is less accurate for small samples sizes, but converges asymptotically (with 
average sample size) to the correct answer. For the special case of S" = 1 it yields an analogue of 



the approximate Bayes Factors from Wakefield (2009) and Johnson (2008), and in what follows we use 



ABF^^ to denote this approximation under the ES model. The nice feature of ABF^^ is that it has an 
intuitive analytic form with close connections to standard frequentist test statistics for meta-analysis. 
Proposition 1 below gives this analytic form in detail. 



9 



Before stating Proposition 1, we introduce some notation. 



Association Testing in a Single Subgroup 

First, consider analyzing a single subgroup, s. Let /3s and (T^ denote the least square estimates of 



and as from the linear regression model (2.1 ) using only data from s. Then an estimate for the 
standardized effect bg can be obtained from 

bs = ^s/^s- (2.15) 

Under the assumption of no association, the standard error of 6s, 6s '■= se{bs), is 

J2 = ^ (2.16) 

9's9s - ns9i 

A statistic Ts for testing 6^ = can be constructed as 

Ts = = (2.17) 

se(6s)2 f^l^l 

Note that Ts is also equal to f3s/se{f3s), which is the usual t-statistic for testing /3s = in a 
frequentist framework. 



Both Wakefield (2009) and Johnson (2008) derive an approximate Bayes Factor for testing bs 



N(0, (j)'^) vs. bs = 0, which has the form 



ABF» ,(r.,i.;*) := y^exp(^^jj^j . (2.18) 

As noted by Wakefield, if (j) is chosen differently for each SNP, and proportional to the value of 
5l for that SNP, then ABF^^gj^ will rank the SNPs in the same way as the usual test statistic 
Ts- This result connects the standard frequentist analysis to a particular (approximate) Bayesian 
analysis in the case of a single subgroup. Proposition 1 below effectively extends this to multiple 
subgroups, allowing for heterogeneity among subgroups. 

Testing Average Effect in a Random- effect Meta-analysis Model 
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Now consider the standard frequentist test of 6 = in a random-effect meta-analysis of all sub- 
groups, with bs ~ N(5, (j)'^). If (j) is considered known, the standard estimate for b and its standard 
error := se(6) are 



2 , .2^-l, 



E.(5? + 



(2.19) 



and 



-1 ■ 



(2.20) 



The usual frequentist statistic for testing 6 = is 



ES 



se(6)2 



(2.21) 



Applying Johnson's idea (Johnson ( 2005| 2008)) we can "translate" this test statistic into an 
approximate Bayes Factor for testing b ~ N(0,cj^) vs. 6 = 0, which yields 



exp 



^ES ^ 

2 e+u^ 



(2.22) 



With these ideas and notations in hand we can now describe the analytic form of the overall approximate 



Bayes Factor ABF^^(0,w), as a simple product of the ABFs (2.18) and (2.22) 



PROPOSITION 1. Under the ES model, applying a version of Laplace's method to approximate the 
Bayes Factor BF^^{(j), co) yields the approximation 



BF^^{cl>,u) « ABFES(^^^) ABFfl^,^{TisX;u^) ■llABFf^^,^iTl6,;<P), 



(2.23) 



an. 



d ABF^S^^^ 



UJ) converges 



(almost surely) to BF^^((/), w) as ris —)• oo for all subgroups s. 



Proof. See appendix |A.1[ □ 

NOTE 1. // the study-specific residual error variances, CTg are considered known (rather than being 
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assigned a prior distribution), and used in place of ag in the calculations o/ ABF^^, then the approxi- 
mation is exact, and ABF^^(0, w) = BF^^(0, w). This fact, together with the fact that the estimators 
as are consistent for as, provides an intuitive explanation for why the proposition holds. 

NOTE 2. The numerical accuracy o/ ABF^^ as an approximation to BF^^ depends on sample sizes, 
and for small sample sizes it may not be sufficiently accurate for routine application. However, a simple 
modification, described in appendix yields much greater accuracy. 

Proposition 1 breaks down the overall evidence for association into parts that are due to the evidence in 
each individual subgroup (the second term) and a part that reflects the consistency of the effects across 
subgroups (the first term). In particular, if all subgroups show effects in the same direction, then the 
first term will tend to be large (^ 1) and provide a "boost" in the evidence for association compared 
with the situations when the effects across subgroups are in different directions. 



A similar result holds for the EE model and is given in appendix |A.2 The detailed computation of Bayes 



Factors involving CEFN priors is shown in appendix A. 3 In appendix O we compare the numerical 



accuracies of the various Bayes Factor approximations described above. 



2.4 Properties of Bayes Factors 

In this section, we discuss some interesting and important properties of the Bayes Factors described 
above. 



2.4.1 Bayes Factors Depend Only on Summary Statistics 

^ ^ EE 

Both the true Bayes Factors (BF^^,BF™) and the approximations, (BF , BF , ABF^^, ABF^^) 
depend on the observed data in each subgroup only through a set of summary statistics, i.e., a 6-tuple 
(rig, Vps, I'Ss, y'sUs^ d'sQsi y'sQs)- Therefore, to use the proposed Bayesian methods it is not necessary 
to have access to the full data set from each individual subgroup. Further, for computing the ABFs 
the summary statistics needed from each subgroup are reduced to only [bs,se{hs)) for the ES model 
and (/3s, se(/3s)) for the EE model. These properties are potentially useful when collaborating among 
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groups, where sharing of raw data can be more difficult than sharing summary data. 



2.4.2 Induced Single Study Bayes Factors 

For the ES model, in the special case of one subgroup (5 = 1), both the actual Bayes Factor and our 



approximations to it (2.23) reduce to results from previous work. More specifically, the approximation 



— ^ES 

BF becomes exact in this case, and equal to the Bayes Factor derived by Servin and Stephens (|2007|), 



Wakefield 


(2009 


) (see also 


Johnson 


(2005 


) and 


Johnson 


(2008)) 



2.4.3 Non-informative Subgroup Data 

Suppose that in one of the subgroups, s, the sample genotypes concentrate in one of the three genotype 
categories. This situation might arise, for example, in cross-population genetic studies, where it is not 
uncommon to observe SNPs that are quite variable in some populations, but with little or no variation 
in other populations. Intuitively, in this case, subgroup s will contain little or no information for testing 
the global null hypothesis. Indeed, this will be correctly reflected in the standard error for the effect 
size, 6s, being large, and this will in turn result in study s not contributing to the Bayes Factor. For 



example, it is easy to see that, for large 6s (formally, in the limit 6s — ?• oo), the ABF (2.23) is unaffected 
by the association data in study s (Ts); and a similar result holds for the exact Bayes Factor (appendix 
[A| ) and for both EE and ES models. In other words, the Bayesian procedure correctly reflects the non- 
informativeness of the data from study s. Although this property seems very intuitive and one might 
expect every reasonable statistical procedure to possess it, many widely-used methods to combining 
results across studies do not have this property (e.g. Fisher's combined probability test). 



2.5 Extreme Models and Connections with Prequentist Tests 

The proposed models are very flexible, covering a wide range of types and degrees of heterogeneity by 
setting different values for (or k in the CEFN prior). In this section, we discuss the two extremes 

of no heterogeneity ("fixed effects") and maximum heterogeneity, and establish connections between 
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Bayesian and frequentist testing approaches in these settings. 



2.5.1 The Fixed Effects Model 

A fixed effects model assumes genetic effects are homogeneous across subgroups. In the ES model this 
corresponds to setting = 0, since this ensures bs = b for all s. Similarly, in the EE model setting 
'0 = ensures f3s = (3 for all s. 

Since fixed effects analyses are widely used, we now briefly divert to connect our notation and models 
to existing methods. In the cases = and tp = the test statistics 7es and Tee defined above have 
particularly simple forms, being a weighted sum of the individual Tg statistics from each study (often 
referred to as a weighted sum of Z scores when the sample sizes are large). In particular, they are of 
the form 

T= ^^^^^^ (2.24) 



where 

1. For the ES model, Ws = se(6s)~"'^ ~ \/'^nsfs{l — fs), 

2. For the EE model, Ws = se(4)-i « a^^ y^2nsfs{l - fs), 

and fs denotes the allele frequency of the target SNP in subgroup s (with the approximations coming 
from an assumption of Hardy- Weinberg equilibrium in each subgroup). We note in passing that these 
representations give some insight into the main practical difference between the ES and EE models for 
testing: the EE model gives greater weight to studies with small residual error variance. Note also that 
the Ts values are the same for both EE and ES models, and independent of measurement scale, but 
as depends on measurement scale, so Tes is robust to studies using different measurement scales (or 
different transformations of the phenotypes) but Tee is not. In addition, these representations clarify 
the connection between these statistics and the methods used in the meta-analysis software METAL 



(Wilier et al. (2010)). Specifically, METAL implements tests using the weighted statistic (2.24) with 
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two different weighting schemes, one corresponding to the EE model weights above, and the other with 
the weights equal to This latter scheme corresponds to the ES model only if fs is equal across 

studies. (Where fs varies across studies the weighting in the ES model seems, to us, preferable to the 
METAL scheme since, intuitively, studies with small /<j(l — fs) provide less information.) 

Returning now to the Bayes Factors, in the fixed effects cases, = 0, the approximate Bayes Factors 



(2.23) simplifies to 



where 



= (2.26) 



And a similar expression holds for ABFg^(tt;) := ABF^^('0 = 0,w). 
Implicit p-value prior 

We now answer the following question: under what prior assumptions will ABFg^ produce the same SNP 



rankings as the frequentist test statistic 7es? Wakefield (2009) names this kind of prior the "implicit 



p-value prior" , as it effectively identifies the implicit prior assumptions that are made by the standard 
practice of ranking SNPs by their p-value computed from 7es- 

Note that, although for a given SNP ABFff (w) is a monotone function of 7es , for a fixed value of a; the 
two statistics will not generally rank SNPs in the same way because ^ varies among SNPs. If however 
we allow uj to vary among SNPs in a particular way, then the two statistics will agree on their rankings, 
as summarized in the following proposition. 

PROPOSITION 2. (Implicit p-value prior, fixed effects) In the ES model, if the prior hyperpa- 
rameter uj is allowed to vary among SNPs, with 

iOp = KCp, (2.27) 
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where p indexes SNPs, and K is any positive constant, then ABF|^ and 7es will produce the same 
ranking of SNPs. 



Proof. This follows directly from substituting (2.27) into (2.25). □ 



Of course, a similar result holds for the EE model. 

NOTE 3. Recall that Qp is the standard error of b for SNP p, and so it is large if there is little 
information about b at a particular SNP. In a meta-analysis setting Qp can vary across SNPs not only 
because allele frequencies may vary across SNPs, but also because data on some SNPs may be available 
in only a subset of studies. Recall also that large values of oOp correspond to a prior assumption that the 
effect size b at SNP p is likely to be large (in absolute value). Thus the implicit p-value prior effectively 
assumes that there are larger effects at SNPs with less information. 

NOTE 4. In the idealized case where data on all SNPs are available on the same subgroups, and the 
subgroups also have similar allele frequencies at every SNP ( as might happen if the subgroups come from 
a single random mating population) then the sample genotype variance of SNP p in subgroup s can be 
well approximated by 2nsfp{l — fp), where fp is the population allele frequency of SNP p. (Note the 
slight abuse of notation, since we previously indexed f by subgroup, whereas here it is indexed by SNP.) 



Consequently, the implicit frequentist prior (2.21) can be written as 



which is effectively the same as the single subgroup case discussed by Wakefield (2009) 



2.5.2 Maximum Heterogeneity Model 

We now consider the other end of the spectrum: models with very high heterogeneity. Specifically, if 
we consider a class of ES models with some fixed prior expected marginal effect sizes {(j)'^ + w^), then 
the model with w = represents the maximum possible heterogeneity among all models in the class. In 
this case, the average effect b is forced to be strictly 0, and the effects bs\(j) are independent, A^(0, (p"^). 
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It can be shown from (A. 13) and (A.28) that for both the EE and the ES models, the exact Bayes 
Factor under this particular setting, BFmaxHi is the product of the individual Bayes Factors, i.e. 



BFjnaxH — JjBFsingle, s, (2.29) 

s 

where BFgingie, s is the exact Bayes Factor calculated using data only from subgroup s. This relationship 
also holds for the ABE, i.e. 



ABF^|,h('^) := ABF^H^,^ = 0) = 11 \/ ^2^^ (t 5fT^) " ^^'^^^ 

To connect this with frequentist tests, consider the likelihood ratio test of the global null hypothesis 
bs = (for all s) vs the general alternative where the bg values are unconstrained. The likelihood ratio 
test statistic for this test can be written as the product of likelihood ratio statistics in each subgroup: 

LRmaxH = nLR«, (2-31) 

s 

where LR^ is the likelihood ratio test statistic for 6<j = vs the alternative that bs is unconstrained. For 
reasonably large sample sizes, LR^ can be well approximated by 

lim LR<, w exp f-^) , (2.32) 



2 



and so 



LRmaxH~exp( ) . (2.33) 



Thus the likelihood ratio test is approximately the same as a test based on Y2s '^si which (again assuming 
large sample sizes) is the sum of the squared Z values, and p-values can be obtained by noting that 
under the global null hypothesis this sum will be ~ x^- This test is very similar to Fisher's approach 
to combining test statistics from multiple studies. 

Now, again, we consider the following question: under what prior assumptions will ABF|^^^jj give the 
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same ranking of SNPs as Y^gT^l Under the ES model, we can see that in fact no single (j) value will 
give this result. However, if we relax the ES model assumption by allowing (p to be subgroup specific, 
with 



K5l (2.34) 



where K is a constant for all subgroups and all SNPs tested, then it is easy to see that the resulting 
ABF yields the same ranking of SNP associations as X^s^^- 



NOTE 5. Recalling that 6s is the standard error for bg, the implicit p-value prior (2.34) effectively 
assumes bigger effects in those subgroups with less information. There seems to be no good justification, 
in general, for this prior assumption. 

2.6 Extension to Case-Control Data 



In situations when phenotypes are case/control status, we can replace the linear model (2.1) for each 
subgroup by a logistic regression model: for individual i in subgroup s, the phenotype-genotype associ- 
ation is modeled by 

Furthermore, we use the same form for the priors on /i^, f3s and /3 as described in the EE model for 
quantitative traits. 

Computing Bayes Factors for this model is challenging because the marginal likelihood is analyticly in- 
tractable. To ease the computation, we approximate the subgroup-level log-likelihood function /(/3s, /U^) 



given by (2.35) using an asymptotic expansion around its maximum likelihood estimates. The resulting 



approximate Bayes Factor has the same functional decomposition form as in (2.23) only with t-statistics 



replaced by Wald statistics. We show the details of the derivation and the results in appendix [Bj 
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3 Applications and Illustrations 



3.1 deCODE Recombination Study 



The deCODE recombination study (Kong et al. (2008)) aimed to find genetic variants that explain 



genome-wide recombination rate variation. The study genotyped 1,887 males and 1,702 females from 
the Icelandic population and performed a genome-wide scan searching for association between SNP 
genotypes and the estimated genome-wide average recombination rate for each individual. 

Prior to this study, it was already well known that male and female recombination rates differ consid- 
erably over moderate-sized genomic regions; the researchers therefore analyzed the data separately for 
males and females. They estimated the genetic effect sizes on the recombination phenotype assuming 



an additive model (2.1). For recombination rate in males, they found three highly correlated SNPs in 
a small region on chromosome 4pl6.3 show strong association signals. Interestingly, these three SNPs 
also show strong associations in females, but with the estimated effect being in the opposite direction 
(i.e. the allele associated with lower recombination rate in males appears to be associated with higher 
recombination rate in females). 

Here we use these data as a simple illustration of our Bayesian analysis tools, taking the summary-level 



data on three reported associations in Table 1 of Kong et al. ( 2008 ) to compute ABFs under different 



models of heterogeneity. In particular, we use their point estimates of effect sizes /3maie and /3femaie and 
infer se(/3maie) and se(/3maie) from their corresponding reported p-values. With these summary data we 
are able to compute ABFs under the EE model. We treat males and females as two subgroups and 
consider 4 levels of expected marginal overall effect sizes with \J'\\P' ^vfi = 5, 10, 20,40 (the phenotype 
scale being centi-Morgans) and 5 levels of heterogeneity levels with '\\P' jw^ = 0, 0.5, 1, 2, cxd. In total, we 
obtain a grid of 4 x 5 different (-0, w) combinations and we treat every grid value as a 'priori equally 
likely when computing ABF^^. (Of course it would be easy to consider a denser grid of values, and 
necessary if we wanted precise estimates of and but this coarser grid suffices for our purposes here.) 

The resulting Bayes Factors are shown in Table [TJ Note that the meta-analysis Bayes Factor allowing 
for heterogeneity (ABF^^) is many orders of magnitude larger than either of the subgroup-specific 
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Bayes Factors, which are themselves larger than the Bayes Factor under the fixed effects meta-analysis 
model (ABFg^). This illustrates two simple but important points. First, that a meta-analysis can yield 
considerably stronger signal than subgroup-specific analyses, and second that in this case a standard 
fixed-effects meta-analysis would be ineffective at identifying the association signal. Of course, in 
general, the "right" level of heterogeneity is unknown; ABF^^ deals with this problem by averaging 
over different levels of heterogeneity (including the fixed effects, or no heterogeneity, case). In general we 
view this ability to average over unknown quantities as an attractive feature of the Bayesian approach, 
although as we shall see in the next application it can be helpful to examine the components of this 
average separately. 





Male 




Female 


Meta Bayes Factors 


SNP 


Effect (p-value) 


ABF™ 

single, male 


Effect (p-value) ABF^^ 




ABF^E 


rs3796619 
rsl670533 
rs2045065 


-67.9 (1.1 X 10-1^) 
-66.1 (1.8 X 10-11) 
-66.2 (1.6 X 10-11) 


101112 
108.06 

108.11 


67.6 (7.9 X 10-*^) 102-81 
92.8 (4.1 X 10-*^) lO''-^^ 
92.2 (6.0 X 10-8) 10^-^° 


103.07 

lOi.io 

101.18 


1013.91 
1012.58 
1012.49 



Table 1: Bayesian meta-analysis of genetic associations with recombination rate. The SNPs and their es- 



timated effect sizes and p-values are directly taken from Kong et al. ( 2008) Table 1 . We compute approx- 
imate Bayes Factors under the EE model using only those reported summary statistics. ABF^^j^ ^^^^ 
and ABF^gjg female approximate Bayes Factors computed using only male subgroup and female 
subgroup data respectively. 



Having established that an association exists, one might turn to assessing the extent of the heterogeneity 
among subgroups. We note that, although the p-values of the three SNPs indicate the genetic effects in 
males and females are separately both highly significant, and the effect size estimates have opposite signs, 
the p-values themselves do not directly assess evidence for heterogeneity. In contrast, the Bayes Factors 
in Table [T] can be compared with one another to help directly assess the evidence for heterogeneity. 
Indeed, the fact that ABF^^ is substantially larger than ABFg^ immediately indicates that the data 
are inconsistent with the fixed effects assumption that effects are the same in both subgroups. More 
detailed investigation can be performed by comparing the Bayes Factors computed under different levels 
of the heterogeneity parameter (ip'^ /w'^); in this case the data are consistent with infinite values for this 
parameter (i.e. with the "maximum heterogeneity model", w'^ = 0). 
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On the face of it, the analysis above appears to support the conclusion that one of these SNPs may have 
causal effects of different signs in males and females. However, this may not be the case. In particular, 
a recent association analysis of recombination rates in other samples suggested that this genetic region 
may actually contain more than one genetic variant affecting recombination rates, some acting in males 
and others in females, rather than a single genetic variant with antagonistic effects in the two groups 



(Fledel-Alon et al. (2011)). A more detailed assessment of this would require association analysis of 



multiple SNPs simultaneously, rather than the single SNP analyses we consider here. 



3.2 Global Lipids Study 



The Global Lipids consortium (Teslovich et al. (2010)) conducted a large scale meta-analysis of genome 



wide genetic association studies of blood lipids phenotypes. In this study, more than 100,000 individuals 
of European ancestry were amassed through 46 separate studies (grouped into 25 studies in their final 
analysis). For each individual, measures of total cholesterol (TC), low-density lipoprotein cholesterol 
(LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) were obtained. Large- 
scale (whole genome) genotyping of genetic variants (SNPs) was performed and missing genotypes were 
imputed: in total, about 2.7 million common SNPs were included in the final association analysis. 
In each individual study, all four phenotypes were independently quantile normal transformed; single 



SNP association tests were performed for all SNPs and all phenotypes using the linear model (2.1) 



and the estimated effect sizes and their standard errors were computed. The meta-analysis combined 



these summary- level data from each individual study using the software METAL (Wilier et al. (2010)), 



using the weighted statistic (2.24) with weights Ws = y/nl-, which, as noted above, can be viewed as 



an approximation to a fixed effects analysis under the ES model if the SNP genotype frequencies are 



similar across studies. Teslovich et al. (2010) reported 168 SNP-phenotype associations exceeding their 
"genome- wide significant" threshold (fixed effects p- value < 5 x 10^®), and identified 95 genes, with 59 
showing genome-wide significant association for the first time. 

Given that this meta-analysis involves 25 different subgroups, with widely differing enrollment criteria 
(e.g. different disease cohorts), one might expect that heterogeneity of effects could be an issue. Indeed, 
many of the reported associated loci had effect size estimates of different signs in different studies. 
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Furthermore, one might worry that the fixed effects analysis used in the original analysis could have 
missed significant associations with moderate heterogeneity. To assess these issues we reanalyzed the 
data, using our Bayesian tools that allow for different degrees of heterogeneity. 

To perform these analyses we were able to obtain access to summary data from each study, in the form 
of an estimated effect size (computed from the quantile-transformed phenotype data) and its standard 
error for each SNP in each study. These summary data allow us to perform analyses under the EE 
model, rather than the ES model effectively used in the original analysis. We present results for ABFs 
computed under three different types of EE model that assume increasing amounts of heterogeneity: the 
fixed effects model, the "limited heterogeneity" CEFN model, and the maximum heterogeneity model. 
In each case we assumed a discrete uniform prior on the overall genetic effect size (i.e., (fc^ + l)w'^ in 
the CEFN model and w'^ + in the other models) on the set {0.1^, 0.2^, 0.4^, 0.6^, 0.8^}. For the fixed 
effects model, ^ = 0; for the maximum heterogeneity model, w = 0; and for the CEFN model we set 
k = 0.326 which gives a prior probability of 1/1,000 that the genetic effect in each study has an opposite 
sign to the expected average effect p. 

Although a fully automated Bayesian analysis would most naturally average over the three different 
models of heterogeneity we considered, in practice, because of their different sensitivity to particular 
data features (see below) we found it helpful to examine them separately in this application. 

As an initial check on data handling we verified that our Bayesian fixed effects analysis produced 
similar results to the original fixed effects analysis. As expected, we found that ABFg^ ranked SNP 
associations very similarly to the original reported (ES model) p-values. However, there were a few 
notable exceptions. In particular, a few SNPs showed much stronger association signal in ABE™ than 
the original analysis. Further investigation suggested that these results likely reflected the EE analysis 
being less robust to certain issues than the original ES analysis (rather than, say, due to the difference 
between the prior distributions we assume in our Bayesian analysis and the implicit prior distributions 
assumed by the original p- value analysis). For example, SNP rsl7061870 with LDL phenoype, had 
ABFg^ > 10^^ {'Tee — 86.46 with a corresponding p-value = 1.4 x lO"^*^) compared with an original 
reported j*- value = 0.028 (7^g = 4.85). But examination of the study-specific data for this SNP showed 
a suspicious pattern: the p- value was 2 x 10~^^ in one study (the Family Heart Study, FHS), but no 

22 



smaller than 0.1 in the 5 other studies for which this SNP had genotype data available. Furthermore, 
the very small p-value in FHS was driven primarily by a very small, probably erroneous, estimate of the 
residual error in that study (the sample size of this particular study is not large) which under the EE 
model results in a very high weight on that study, but in the ES model does not. We emphasize that 
we performed the EE analysis here because it was what we were able to do easily with the available 
summary data, rather than because we prefer it. 

Next we assessed the evidence for heterogeneity of effects in the 168 association signals reported by the 
original analysis. We did this by comparing the support for the limited heterogeneity model (ABF™^^) 
with the support for the no heterogeneity model (ABFg^). The majority of phenotype-SNP pairs 
(111/168) showed stronger support for the no heterogeneity model, although a few showed overwhelm- 
ing support for the limited heterogeneity model (Figures [l| [2]) . In light of the discussion of the recom- 
bination example above, it is important to note that even these apparently overwhelming signals for 
heterogeneity may actually reflect other factors (e.g. multiple SNPs in a gene affecting phenotype, or 
different measurement protocols in different studies) rather than genuine biological interactions with 
study group. 

Of course, the fact that the 168 associations reported in the original analysis generally showed little 
evidence for heterogeneity could be explained by the fact that these associations were identified by a 
fixed-effects analysis that assumed no heterogeneity. Therefore, we finally assessed whether the original 
fixed effects analysis likely missed some associations showing heterogeneity across studies. To do this, we 
performed a genome- wide analysis for each phenotype, in each case excluding all SNPs within 1Mb of any 
SNPs originally reported as being associated with that phenotype, and searching for SNPs that showed 
strong evidence for association under one of the heterogeneity models (ABF^^^ or ABFJ^^^jj > 10^) 
but not under the fixed-effects model (ABFg^ < 10^). This threshold (10^) corresponds very roughly 
to, and is perhaps slightly more conservative than, the threshold effectively used in the initial analysis. 
(We used the same threshold for all three models for simplicity, but in this setting it would probably 
be better to use a variable threshold; for example, it would probably be appropriate to use a more 
stringent threshold for ABFj^^^y than the other models, reflecting the fact that strong heterogeneity is 
uncommon in this context.) 
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Figure 1: Assessment of evidence for heterogeneity in 168 reported phenotype-SNP association signals from 
Teslovich et al. ( |2010 ). Large values on the y axis indicate stronger support for the model allowing for limited 
heterogeneity (ABF^j^) compared with the model with no heterogeneity (ABF|'|'). The three highlighted points 
correspond to associations with overwhelming evidence for the limited heterogeneity model, ABF™jj/ABF|'|' > 
10^°; forest plots for these three associations are shown in Figure [2] 



Overall we found 42 SNP-phenotype associations satisfying this criteria (counting multiple signals due 
to SNPs in LD with one another as a single association) , representing associations potentially missed by 
the original analysis. However, detailed investigation of these suggested to us that many of them were 
likely not to be genuine associations. For example, 36 of these associations showed strong signals in 
ABFJ^^j^jj, but not in the other models; but all these were driven by apparently strong associations in the 
FHS study alone, which themselves seemed likely to be due to data processing errors (e.g. for these SNPs 
the p-values in the FHS for quantile transformed phenotypes were many orders of magnitude smaller 
than for the original phenotypes). We then dropped the FHS data and re-performed this analysis. All 
6 remaining signals from the previous analysis still satisfy the criteria (Table [2]). Of those, the first 
two listed seem almost certain to be genuine: the genes ABCAl and KLF14 are reported in Teslovich] 



et al. (2010) as associated with other lipid phenotypes (ABCAl with HDL and TC; KLF14 with HDL), 



but not with the phenotypes we listed in Table [2j and both reflect associations that just missed being 
significant in the original fixed effects analysis. The next two map (in the human genome build 36.3 that 
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Figure 2: Forest plots for the three highlighted association signals in Figure [TJ which showed overwhelming 
evidence for heterogeneity of (apparent) effects. 



we used) approximately 6Mb apart on chromosome 11; however examination of the raw data showed 
these SNPs to have similar effect estimates across studies, suggesting that they are correlated with one 
another. Usually SNPs 6Mb apart would be uncorrelated with one another, but this could be explained 
by mapping errors, and indeed in more recent genome build the SNPs are closer together (though still 
2.5Mb apart). Further, both SNPs map to within a few Mb of the gene LRP4, which was reported as 
strongly associated with HDL in the original analysis. It seems quite possible that the signals at these 
SNPs are simply reflecting correlation of these SNPs with this reported association and not a novel 
association. Finally, the last two associations are again driven by apparent anomalies in a single study 
(this time B58C-WTCCC). 



In summary, we flnd that the original fixed effects analysis in Teslovich et al. (2010) was highly effective 



at identifying the main association signals in these data. Nonetheless, some of the (presumably real) 
associations identified by the original study do show a substantially stronger signal in analyses that allow 
for heterogeneity (Figure [T]), and it remains possible that in other data sets meta-analyses allowing for 
heterogeneity could cause findings that are "borderline significant" under a fixed effects model to be 
promoted to a level of convincing evidence. On the other hand, our results also provide a cautionary 



25 



Phenotype 


SNP 


Gene Region 


login(BFF) 

olUV tlx / 


loginfBF™ tt) 

oiuv maxrl/ 


login(BF)!?J 

oiu\ cern/ 


LDL 


rsl800978 


5'UTR of ARCAl 


5.2 


3.4 


6.0 


TG 


rsl562398 


Flanking KLF14 


5.3 


-0.2 


6.5 


HDL 


rsll229165 


Flanking OR4A16 


4.6 


4.9 


6.4 


HDL 


rs7108164 


Flanking OR4A42P 


4.2 


4.9 


6.3 


HDL 


rsll984900 


N.A. 


-1.1 


16.6 


6.2 


HDL 


rs6995137 


Flanking SFRPl 


-0.4 


6.9 


4.8 



Table 2: Association signals that show strong association under the models allowing for heterogenetiy 
(ABF^Jj^ or ABFJ^^^jj > 10*^) but less strong under a model with no heterogeneity (ABFf^ < 10^). It 
seems likely that only the first two of these reflect genuine associations missed by the original analysis 
(see text for discussion). 



tale: in the context of meta-analysis of genetic association studies, when associations appear only 
under models allowing for strong heterogeneity, and not under fixed effects models, the reasons for the 
discrepancy must be examined carefully, and the results interpreted critically. Indeed, it seems that 
searching for SNPs showing strong apparent heterogeneity of effects provides an effective way to identify 
data processing errors that may otherwise lurk undetected! 



3.3 Population eQTL Study 



In this section, we apply the proposed Bayesian methods to map expression quantitative trait loci 
(eQTLs) using multi-population data. An eQTL is a genetic variant (here we only focus on SNPs) that 
is associated with gene expression phenotype. 



We consider gene expression measurements from Stranger et al. (2007), obtained using the Illumina Sen 



trix Human-6 Expression BeadChip, on lymphoblastoid cell lines derived from 141 unrelated individuals 



from the International Hapmap project (The International HapMap Consortium (2005)). These indi- 
viduals were sampled from three major population groups (41 Europeans (CEU), 59 Asians (ASN) and 



41 Africans (YRI)) and were fully sequenced in the pilot project of the 1000 Genomes project (Durbin 



et al. (2010)). We focus on the 8,427 distinct autosomal genes that were confirmed to be expressed in 
the same African samples by an independent RNA-seq experiment (Pickrell et al. ( |2010 )). We used the 
SNP genotype data on 14.4 million SNPs from the final release (March, 2010) of the pilot SNP calls 
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from the 1000 genomes project, with no additional ahele frequency filtering applied. In addition to 



the original normalization applied to the expression data in Stranger et al. (2007), we perform quantile 
normal transformations to each selected gene, separately within each population group, to reduce the 
influence of outliers or other deviations from normality. 

Previous studies have shown that most eQTLs are located near to the gene whose expression they in- 
fluence (so-called "cis-eQTLs"). Therefore for each gene we restrict our association analysis to the "cis 
SNPs" which lie within the region 500kb upstream of the transcription start site and 500 kb downstream 
of the transcription end site. Previous analyses of these kinds of data have also either ignored potential 
heterogeneity of effects across populations, or have examined heterogeneity by analyzing each popula- 



tion separately and then looking for differences between the results in each population (e.g. Stranger 



et al. (2007)). However, analyzing populations separately in this way has several disadvantages: for 
example, it has less power to identify eQTLs than a joint analysis (at least, those eQTLs exhibiting little 
heterogeneity) , and if an eQTL is identified in one population but not another then it is unclear whether 
this likely represented genuine heterogeneity or lack of power. Our Bayesian tools help overcome these 
problems: we can analyze all the populations together, allowing for potential heterogeneity, and also 
assess how strong the evidence is for heterogeneity in the eQTLs identified. 

To perform our analyses, we group the samples by their population of origin to form three subgroups and 
apply the proposed Bayesian methods. We use the ES model with a grid of (</>, a;) values. Specifically, 



we consider five levels of \/ (f)^ + uj'^ values: 0.1, 0.2, 0.4, 0.8, 1.6, and seven degrees of heterogeneity 
characterized by cjP' /uj"^ values: 0, 1/4, 1/2, 1, 2, 4, oo. Further, we assign these 35 grid values equal prior 
weight. These prior distributions are broad, covering a wide range of possible effect sizes and levels 
of heterogeneity. For each gene, we compute BF^^ for each cis SNP and identify the highest ranked 
("top") SNP. For each top SNP we assess the evidence for heterogeneity by examining how the Bayes 
Factors vary with the heterogeneity (jP' /u"^. 

We begin by quantifying the fact that mapping eQTLs in all groups simultaneously produces stronger 
association signals than considering the groups separately. Specifically, for the most strongly asso- 
ciated SNP in each gene, we compare the overall Bayes Factor for association averaging over many 
different possible levels of heterogeneity (BFg^^ ) with the largest of the Bayes Factors obtained from 

27 



considering each population separately (BF^^^ ^jj^gj^,). Among 5,691 genes in which top associated 

SNPs are polymorphic in at least 2 populations, 4,470 of them (79%) show stronger signal when 

mapping together (BFj^^ > BF^^|^^ ^j^^gj^) and 2,214 of them (39%) show considerably stronger signal 
(BF^'/BF^s > 10)_ 




Figure 3: Comparison of strength of association signal from mapping eQTLs separately in each population 

(-'^-'^ max single)' ^^"^ simultaneously in all populations, allowing for heterogeneity (EFg^^). A clear majority of 
associations (79%) show stronger association in the joint analysis, often much stronger. 



Examining the posterior distributions of the heterogeneity parameter for each SNP suggested 

that most eQTLs showed little heterogeneity across populations (results not shown). Here we focus 
on the relatively few SNPs that showed evidence for strong heterogeneity; specifically we focus on 
SNPs for which the Bayes Factor for the maximum heterogeneity model {(fP' /uj^ = oo) versus the no- 
heterogeneity model {(jP' /uj"^ = 0) is largest. Examining the value of this ratio across all top SNPs 
(Figure |4]) we identified 10 SNPs with very strong evidence in favor of the maximum heterogeneity 

^ ^ ^ ^ -^g I I 

model (BFjna^H/BFfix > 10^). These are listed in Table 31 and four are illustrated in more detail in 
Figure [5} 
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Figure 4: Comparisons of maximum heterogeneity models and fixed effects models for all top ranked 
SNPs. The 10/8,427 SNPs with strong evidence for the heterogeneity model (BFn,j,^H/BFfix > 10^ and 

ES I I 

BF^y > 10^) are highlighted in green, and hsted in Table pi 
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Gene 


logio(BFfif) 


— -ES 

logio(BF^^^H) 


logio(BFf') 


rs9595893 


RP11-298P3.4 


4.5 


19.3 


19.0 


rsl037495 


SOSl 


7.1 


12.3 


12.1 


rs3180068 


PAQR8 


7.0 


12.0 


12.0 


rs380359 


PLA2G4C 


4.7 


11.6 


11.5 


rs6008545 


TTC38 


0.7 


11.7 


11.3 


rsl313996 


C6orf48 


0.0 


7.3 


7.0 


rsll070253 


BUBIB 


0.3 


7.1 


6.8 


rs4072597 


HEATR2 


0.1 


6.0 


5.7 


rs7581360 


P0LR2D 


-0.3 


5.8 


5.5 


rs437380 


C17orf48 


0.1 


5.7 


5.4 



Table 3: Details of the 10 eQTL SNPs showing strongest evidence for the maximum heterogeneity 
model. 
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Figure 5: Forest plots for four of the 10 eQTL SNPs showing strongest evidence for the maximum 
heterogeneity model. Each panel shows the estimated effect and its 95% confidence interval in each 
population (obtained from a standard linear regression applied separately in each population). 



3.3.1 Investigating Potential Population-specific eQTLs 



Examining Figure [5] suggests that at least some of the eQTLs with evidence for heterogeneity may be 
consistent with a particular type of heterogeneity: having an effect in only some populations, with no 
effect in others. That is they might be "population-specific" . For example, the forest plot for the eQTL 

in BUBIB suggests this eQTL may be specific to the YRI sample. This kind of heterogeneity could 

'^It could be objected that the notion of a "population-specific" eQTL is too simplistic, and that apparent absence of 
effects in some populations more likely reflects very small, non-zero, effects. While sympathetic to this argument, we also 
find the simplicity has a certain appeal, and we view such models as potentially useful nonetheless. 
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occur if, for example, an eQTL affected binding of a particular regulatory element that is active in some 
populations and not others. Such differences in activity could, for example, happen as a consequence 



of natural selection (Kudaravalli et al. (2009)). 



Motivated by this, we here describe how we can use the models described above to explicitly allow for 
"subgroup-specific" effects. Let C denote a binary string of indicators for whether an eQTL is active 
(i.e. has non-zero genetic effect) in each population. For example C = (110) would indicate that the 
eQTL is active only in the first two populations. For the three populations in our data, C has 2^ possible 
values, which we refer to as "configurations". To evaluate the support for a particular configuration, 
we compute a Bayes Factor that contrasts the marginal likelihood of the configuration of interest to the 
nuh model C = (000). For example, for C = (110), 

^ P{yi,y2,y3\9i,92,93^c = (no)) 
P {y„y2,y3\9i,92,93, C = (000)) 
^ P{yi,y2\9i,92) 

P{yi,y2\Ho) 



(The simplification is due to the assumption that the vectors of residual errors in (2.1 ) are independent 
across populations.) Note that this BF depends only on the data from the subgroups where the eQTL 
is assumed active. 

Here, for the populations in which the eQTL is active, we use the ES model and the CEFN prior (with 
k = 0.314), so that we are allowing for only "limited" heterogeneity of effects in populations where the 
eQTL is active (which might be expected a priori, and is also consistent with the examples in Figure 

i- 

To illustrate this approach we apply it to SNP rsl 1070253 in the gene BUBIB. The BFs for all eight 
configurations. Table [4j show strong support for the configuration where this eQTL is active only in the 
Yoruban population. 

It should be evident that a lot more work is required to turn these kinds of observations into solid 
biological insights. Indeed, without additional experimental confirmation we regard it as unclear whether 
any of the eQTLs we highlight here represent genuine interactions. Nonetheless, we believe this example 
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5.9 
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Table 4: Evaluation of population specificity for SNP rsll070253 and gene BUBIB. The log^g Bayes 
Factors for all possible activity configurations (the left column) are shown. 



is helpful both for illustrating the kinds of analysis that arc possible with the tools wc have developed, 
and also for highlighting putative interacting eQTLs that may be interesting for further study. 



4 Discussion 



The main contribution of this paper is to provide a flexible toolbox of methods for Bayesian association 
analyses involving potentially heterogeneous subgroups. We focus particularly on the genetic association 
context, which is characterized by the fact that the first goal is generally to reject the "global null"; 
that is, to identify genetic variants that are associated with phenotype of interest in some subgroup. 
Our examples illustrate how the tools we present can be used to i) identify associations allowing for 
different amounts and types of heterogeneity; and ii) investigate the relative strength of the evidence for 
different amounts and types of heterogeneity. The tools are sufficiently flexible to tackle a wide range of 
applications, including both applications involving limited heterogeneity as one might expect in typical 
meta-analyses, and the more extreme heterogeneity that one might encounter in the context of a Gene- 
Environment interaction, or subgroup-specific effects. We have developed different computationally 
efficient approximations to obtain numerical results, which is critical for handling of large scale genetic 
association data. In addition, we have highlighted connections between Bayes Factors and standard 
frequentist test statistics in this context (e.g. Propositions 1 and 2). 

Many of the models and priors considered here have close connections with those employed in the 
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context of meta-analysis. In particular, they are very similar to the mixed effect meta-analysis models 
in standard frequentist approaches for studying of quantitative phenotypes, where the subgroup-specific 



intercept terms fXg in (2.1) are regarded as fixed effects terms and genetic effect /3s (or bg) are regarded 
as random effect terms. 

Our models are also connected with, but differ in an important way from, those most frequently used 
in studies of gene-environment (GxE) interactions. The typical model used in this context is a linear 
model with marginal effect of subgroups and an gene-subgroup interaction term included, i.e. 

Vi = H + (3eSi + Pggi + /3[g.e]Sigi + ei, ei~N(0,(j^), (4.1) 

where Si is a dummy variable denoting the subgroup membership of individual i, and P[g;e] is the 
coefficient of the subgroup-genotype interaction term and often of interest. This model is quite similar 



to (2.1). By re-arranging and grouping the terms, the linear model can be written as 

yi = + PeSi) + {Pg + Plg.,e]Si)gi + ei, ei~N(0,fT2). (4.2) 

Essentially, each subgroup is described with its own intercept, /i + /3eSj, and its own genetic effect, 
(3e + P[g:e]Si- (Note, if a marginal effect of subgroup is not included, the model is making a much 
stronger assumption on equal intercepts for different subgroups, which can be dangerous in practice 



and may lead to Simpson's paradox (Bravata and Olkin (2001))). Nevertheless, the interaction model 



still makes stronger assumptions by assuming that the error variances across subgroups are the same. 
In comparison, our models allow this quantity to vary across in subgroups. This robust assumption is 



highly desirable in genetic association context, because, most likely, neither model (2.1) nor model (4.1) 
captures all factors affecting the phenotype of interest, and confounding factors almost certainly exist. 
In our meta-analysis model, the effect of the unaccounted confounding factors are "absorbed" by both 
the intercept and error variance terms, whereas the interaction model, lacking fiexible error variance 
terms, does not possess this property. 

One important issue that we have largely ignored here is the question of how to weigh evidence of 
heterogeneity in the data (e.g. large Bayes Factors for high heterogeneity models) against an a priori 
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belief that, in general, strong heterogeneity might be rare. In principle this is straightforward: given a 
prior distribution on different types of heterogeneity, it is trivial to use the Bayes Factors to compute 
posterior distributions. However, there remains an issue of choice of appropriate priors (an issue that 
also arises in a disguised form in frequentist approaches, for example in selecting appropriate p- value 
thresholds when testing for heterogeneity) . Here we have often used (discrete) uniform distributions for 
convenience. In general, one might certainly want to change this, and appropriate priors may be context- 
dependent. For example, in a meta-analysis context, one might put more weight on models allowing for 
limited heterogeneity (the CEFN prior), whereas in an gene-environment interaction context one might 
be willing to allow more heterogeneity. 

An attractive alternative to specifying context-specific priors on different levels of heterogeneity would 
be to attempt to "learn" how common are different levels of heterogeneity from the available data. 
This could be done by incorporating the Bayes Factors we have developed here into a hierarchical 
model, whereby, given appropriate data, the relative frequency of different types of heterogeneity could 
be estimated from the data. This approach might be particularly helpful for gene expression studies, 
where the large number of simultaneously measured phenotypes makes it plausible that the data could 
be quite informative for the relative frequency of different types of heterogeneity, and we see this as an 
important area for future work. 
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A Computing Bayes Factors 

In this section, we show the detailed calculations of various Bayes Factors. 
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A.l Computation in the ES Model 



A particular ES model, describing an alternative hypothesis Ha, is fully specified by setting values for 
{(j), oj) and hyper-parameters (t^i, . . . , vs-, ^i, w-i, . . . ,ls, ms). Under the contrasting null model Hq, we 
set = w = while keeping all other hyper-parameters the same. 

Let (3g = (//s, /3s), Ts = ct7^ and 6 = {P^, . . . , /3^, ri, . . . , r^, 6), the marginal likelihood under model Ha 
can be written as 

p{Y\G,Ha) = J p{Y\G,e,Ha)p{e\Ha)de 

= J (^nys\9s,(3s,rs)^P{Ps\rsAHa)^PiTs\Ha)P{^Ha)^^ (A.l) 
= /(/(n/ P{ys\9s,f^s,rs)P{Ps\rs,b,Ha)df3Apma)dbj]JP{Ts\^^^ 



Let = (1 Qs) denote the design matrix of regression model (2.1) for subgroup s, it follows that 



(-)-"^/2 exp (-^{y, - Xsf3,ny, - Xs(3, ) 

Ts 



' exp (^-liVs - XsbsYiys - Xsbs )^ , 
where y^ = y/fsys ^'^^ = V^Ps — {.yf^sP^s, ^s)- We further denote 



(A.2) 







' vl \ 




and ^s = 











(A.3) 



and write prior distribution P{bs\b, Ha) in following matrix form, 

bs\b,Ha^N(b,^s). 



(A.4) 
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We compute the marginal likelihood by sequentially evaluating the following integrals, 



Fh,,s = J P{ys\Xs,b„Ts)Pibs\b,Ha)dbs 



(A.5) 



• exp I -- ( y,y, - {X',y, + ^;'by{X',Xs + ^jYHKVs + ^sb) + b cD^^b 



Let Jh^ = J {Y\s FHa,s)P{b\Ha)db; this quantity is also analytically computable by straightforward 
algebra. 

To compute Bayes Factor of Ha versus Ho under the ES model, we take limits with respect to hyper- 



parameters {vi, . . . , vs, ^1, . ■ . ,ls, fns) according to (2.13), that is 



BF^S((/.,a;) = lim 



lJHMsPirs)dn---drs 



lJHoUsP(rs)dn...dTs 
J Ku^dTi ■■■drs 
J KuadTi - ■ ■ drs' 



(A.6) 



Let us denote 



RSSo,s = y'sVs - ^sVs, 

RSSi,s = y'^y^ - y'^Xs{X'^Xsy^X'^y^ 

<5? 



1 



9's9s - nsgl ' 
y's9s - nsVsQs 

a'sds - ^sqI 



2\-l ■ 



(A.7) 
(A.8) 
(A.9) 

(A.IO) 

(A.ll) 



where Us and gs are the sample means of pheno types and genotypes in subgroup s. It can be shown 
that. 



s \ s 



r, • RSS 



0,s 



(A.12) 
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and 




■ RSSi = + 



52 + , 



• RSSq , 



(A.13) 



The multidimensional integral J KH^d'^i ■ " " drs generally does not have a simple analytic form (although 
it can be represented as finite sums of complicated hypergeometric functions). Next, we show two differ- 
ent approximations, both based on Laplace's method, to evaluate this integral. The first approximation 



is a direct application of Butler and Wood ( 2002 ) and the second one yields a simple analytic expression. 



Although the integral / KHgdri ■ ■ - drs can be analytically computed as a gamma function, for com- 
puting the Bayes Factor, we also use Laplace's method to numerically evaluate it (which essentially is 
applying Sterling's formula) - we find this recipe yields more accurate result for the final Bayes Factor: 
in particular, when there is only one subgroup {S = 1, where the Bayes Factor can be analytically 



computed as in Servin and Stephens (2007)), we obtain the exact result by applying the first Laplace's 
approximation. 

Laplace's method approximates a multivariate integral in the following way. 



/ hiT)e3^^Ur ^ (27r)^/2|/7^|-i/2/,(^)e^y(t) 
Jd 



(A.14) 



where r is an 5-vector, 



T = aigmaxg{T), (A. 15) 

and I Hj. \ is the absolute value of the determinant of the Hessian matrix of the function g evaluated at 
T. Note that the factorization of the integrand is rather arbitrary, it only requires that function h is 
smooth and positively valued and the smooth function g has a unique maximum lying in the interior of 
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D (for detailed discussion, see Butler (2007)). 



Our first approach to apply Laplace's method sets /i(t) = 1. Except for some trivial situations (e.g. 
S = 1), the maximization oflogKna with respect to r is analytically intractable. In practice, we use the 
Broyden-Fletcher-Goldfarb-Shanno (BFGS2) algorithm, a gradient-based numerical optimization rou- 
tine (implemented in the GNU Scientific Library), to perform numerical maximization. This procedure 
leads to BF'^S((/>,a;). 

Alternatively, we apply Laplace's method by factoring the integrand in such a way that g can be 
analytically maximized. This approach results in a closed-form approximation. More specifically, we 
factor info 



KHa = Kn, ■ ■ ■,Ts)e 



g{Ti,...,Ts) 



(A.16) 



where 



Hn, ■■■,rs) 



JJexp 

s 



n 



• exp 



1 ij'C 



2a2 



52 + 02 
5? 



2 (2 + W2 



'5.2 + 



(RSSo,, 

2^ 



RSS 



(A.17) 



and 



„g(Ti,...,Ts) 



Ts^ ^ • exp I ^ Ts • RSSi„ 

s \ s 



n 



(A.18) 



It is straightforward to show that the unique maximum of g{Ti, . . . , ts) is attained at 



n.o - 2 



RSS 



, s — 1, . . . , S, 



l,s 



(A.19) 



which coincides with the REML estimate of in subgroup-level regression model (2.1). Following the 



notations in section 2.3 and noting the relationship between t and F statistics in simple linear regression. 
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_,2 R,SSo,s — RSS 



RSSi,,/(n, - 2) 



(A.20) 



Applying (A. 14) results in 



BF 



ES/ 



a; 



2 C^ + w^ 



n 



52 /RSSo. 



(52 + 02 \^RSSi, 



2 



exp 



r2 52 

' 2 (52 + (/)2 



(A.21) 
(A.22) 



To further simplify the above expression, we use 



'RSSiJ + ^ _ 



^ =e~ 1 + 
n , -2 V \ ri o 



(A.23) 



and (A.21) simplifies to 



exp 



ES 



2 C^+w^ 



n 



(5? 



(52 + </)2 



exp 



2 5? + 



(A.24) 



Remarks. Note, in case ri, . . . , are known, we can directly compute the exact Bayes Factor using 



BF 



ES/ 



, w) = lim 



J Ho 



(A.25) 



without evaluating the multi-dimensional integrals in (A.6). In this particular case, it is easy to show 



that the exact Bayes Factor has the exact functional form as in (2.23) and (A.24), only with all the f^'s 
replaced by the corresponding true values of r^'s. 



Finally, we give the proof for Proposition 1: 



Proof of Proposition 1. The derivation above serves as a proof. An alternative proof can be obtained 
by noting that the REML estimate of r asymptotically converges to the true value of t with probability 
1. From the remarks above, by applying continuous mapping theorem, we conclude that ABF^^((^, w) 
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converges to BF^^(^,a;) with probability 1. 



□ 



A. 2 Computation in the EE Model 

The procedure for computing Bayes Factor assuming an EE model is essentially the same, we omit 
repeating the details but only show the final results of the Bayes Factor of an EE model Hh, specified 
by {ip,w), versus the null model Hq, 



BF™(V,ti;) 



/ KutdTi ■■■drs 
J KhqCLti ■■■(Its' 



(A.26) 



The expression of Kh^ remains the same as (A.12). We denote 



51 + r,V2 



(A.27) 



It can be shown 



Km 



jj2 _|_ 

"3 1 



n 



5? 



J]r,^"'exp ( - Jj^T, 



exp 



1 



E 



• RSSi s 



5'i 



51 + T.V'^ 



• RSSq , 



2 + it;2 



(A.28) 



We use the similar numerical procedure to obtain BF™('0,u)) as in the ES model. 
To derive ABF^^, we factor Ku into 



Kui. = Kn, . . .,Ts)e 



,g(Ti,...,Ts) 



(A.29) 
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where, 



1 up ( S'^+Tslp'^ ■ ) 

■ exp ■ 



2rj^ + ij 



and 



gs(n,...,rs) = J]r,^-' . exp -- 5^r, • RSSi„ 



Again, function g{Ti, . . . , T5) is maximized at 
, s = l,...,S. 



ris -2 
RSSi, 



We denote 



rs * g'sds-^sgs' 



rp2 _ ^ 

' dV 



^sid's+^') 
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Using the similar procedure as in the ES model, we obtain 



^2 + w 



exp 



EE 



W 



2 i^ + w' 



n 



) 



(A.38) 



Same as we have discussed in Remarks of section A.l , if ri, . . . , are known, the exact Bayes Factor 



of the EE model has the same function form as in (A.38), only with f^'s replaced by corresponding r^'s. 



A. 3 Computation using CEFN Priors 



Using curved exponential family normal prior, the computation of the Bayes Factors is slightly different 
than what we show in previous sections. Here, we use the ES model as a demonstration, the procedure 
for the EE model is very similar. 

To compute the Bayes Factor of a CEFN-ES model defined by parameters [k, u) vs. the null model. 



we can carry out the same and exact calculation up to (A. 5). However, due to the nature of the CEFN 



prior, we can no longer perform analytic calculation to integrate out h. Instead, we exchange the order 
of integrations by first analytically approximating the multi- dimensional integration with respect to 
Ti, . . . ,rs using the second procedure of Laplace's method described in previous sections. As a result, 
we obtain the approximate Bayes Factor as a one-dimensional integral 



ABFg|pN(A:,^) 



2-nuj 



RSSo,s 



ns/2 



■ exp 



RSSi 



n 



<52 + A:62 



<52 + A:62 



52 + A;62" 



(A.39) 



dh. 



We then apply an adaptive Gaussian quadrature method, QAGI (implemented in GNU scientific library), 
to numerically evaluate this integral. Essentially, this method first maps the integrand to the semi-open 
interval [0,1) using the transformation y = [\ — b)/b, then apply the standard adaptive Gaussian 
quadrature routine for the finite interval integration. 
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For the EE model with CEFN prior, the final one-dimensional integral can be shown as 



ABF™ 



• exp 



cefn(^)'^) = 



2-nw 



RSSq 
RSSi 



ns/2 



n 



dl 



dl + A;/32 ' 



dl + kp'^' 



(A.40) 



B Bayes Factor for Binary Regression Models 

In this section, we show the computation of Bayes Factor for case-control data. 

Let us denote (3^ = (/is,/3s). The key component in our computation is to approximate subgroup- level 
log-likelihood function with a quadratic form expanding around its maximum likelihood estimate, 

i.e. 



logP(?/Jff„/3J = /(/3J ^ - -{(3, - PJls0,){P, - Ps). 



(B.l) 



where hiPs 



is the expected Fisher information evaluated at Pg. Although this 



type of approximation generally requires the observed Fisher information in (B.l), the observed and 



expected Fisher information indeed coincide as we use the canonical (logistic) link for binary regression 
model. 

Further, we note 



7? := Var(/3,) = {i^j^ - ip^f,Jf^f,Jf,jJ ' 



(B.2) 



is the estimated asymptotic variance of MLE /3s. 



Given approximate log-likelihood function (B.l) and a model He specified by {il),w), the prior distribu- 
tion for Pg is given by 



(B.3) 
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where 



It follows that 









( vl 


\ 






and = 








) 
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/ 



(B.4) 



j Piy,\g„(3,)Pi(3MHc)df3, 

exp (10,)) ■ • \Is + *;^|-^ • exp - Ws + ^;')-'ls)Ps 



(B.5) 



with ris = ^-\ls + Isf^s- 

Under contrasting null model Hq, the parameter space is restricted to /S^ = 0, for satisfies this 
restriction 



where = /is + /^^Ps- It can be shown that 

= exp • i^^HvA + ^^7^)"^ • exp(- 



(B.6) 



27f 



(B.7) 



Finally, we compute 



ABF*^°('0,f«) = lim 



(B.8) 



where the limit is taken as Vg — >■ oo, Vs. By straightforward algebra, we obtain the following final result 
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BFCC(^,^) « ABFCC(^,^) := ABF,%,(Z2^,C;«;) • [] ABF^ , 7,; V), (B.9) 



where 



(B.12) 



and 



7,' := se(;9,)^ (B.13) 



Z2 = 4, (B.14) 



72 _ 

^2 se(;l)2 = ^ , „^ (B.16) 



-Zcc = ^- (B.17) 



E.(7l + V'2)-^' 

~2 



C Small Sample Size Correction for Approximate Bayes Factors 

The accuracy of ABF^^ rehes on the sample sizes in subgroups: when sample sizes arc small in some 
subgroups, the approximation may become inaccurate. In particular, we consider the behavior of the 
approximate Bayes Factor when the null hypothesis is true. A valid Bayes Factor has the property that 



E(BF|ifo) = 1, (C.l) 
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where the expectation is taken with respect to the data distribution (Y in our settings) under the null 
model. This is because, 

E(BF|i/o) = I ^1^1^ • ^(^l^o) dY = 1. (C.2) 



Unfortunately, when sample sizes are small, (C.l) can be violated (as the expected value is strictly 



greater than 1) and this essentially indicates that the approximation becomes inaccurate. 



To demonstrate a violation of (C.l), we consider the special case of one single subgroup. The approxi- 



mate Bayes Factor assuming the ES model with parameters {(j), uj) is given by 

ABFf^^,^{<P,to) = v/r3Aexp(^r2), (C.3) 

and, 

log(ABFiSgi,(0,a;)) = ^log(l-A) + ^r2, (C.4) 

where A = ^z^^a^^a and takes values from [0, 1]. Under Hq, Tg follows t-distribution with ns — 2 degree 
of freedom and 

E(^'l^o) = ^ > 1. (C.5) 

Tig 4 

Now consider the continuous function 

/(A) = ^log(^) (C.6) 
for A S [0, 1], it can be shown that 



lim/(A) = l (C.7) 
A->-0 

lim/(A) = oo. (C.8) 
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Hence, there must exist values of A G (0, 1), such that 



1 < /(A) < E{T^\Ho). 



(C.9) 



Consequently, by Jensen's inequality, for those A values 



log (E {ABFfi^,jHo)) > E (log (ABFiS^iJ \Ho) > 



(C.IO) 



This shows property (C.l) does not generally hold for approximate Bayes Factors, and when sample 



size Hg is small, the inaccuracy may become severe. 



We now propose a simple correction procedure for small sample sizes, which ensures the resulting 



approximation satisfies property (C.l). Specifically, we modify (2.23) into the following form 



(C.ll) 



where the function denotes a one-to-one quantile transformation from a t-distribution with Ug — 2 
degree of freedom to a standard normal distribution, and the function q is defined as 



J? 

>2 _ "cor 



(C.12) 



where 



^^{6i + <P')-'6sqs{Z 



k2\-l 



(C.13) 



Note, the quantile transformation functions qs and q converge to the identity mappings as — s- cxd 



and the asymptotic property of (2.23) is preserved. The numerical performance of this correction is 
demonstrated in appendix [D| 



To show the corrected version of approximate Bayes Factor satisfying (C.l) , we note that ABF 



depends on data Y only through Tg {6s depends on genotype data but not Y). Further, from Remark 



in appendix A.l, we also notice the approximation becomes an exact Bayes Factor (for which property 
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(C.l) is guaranteed) if estimated error variance terms cj^'s are replaced by their corresponding true 
values. When the true error variances are plugged in, under the Hq, T^'s instead follow standard norm 



distributions. It is therefore sufficient to satisfy property (C.l ) by quantile transforming each individual 



Tg in (2.23) from the t-distribution to standard normal distribution. In essence, the correction can be 
viewed as a general strategy of providing a better point estimate of residual errors, therefore the similar 
strategy also likely improves the accuracy of approximate Bayes Factor when EE model or CEFN model 
is used. 



D Numerical Accuracy of Bayes Factor Evaluations 

In this section, we evaluate the numerical accuracy of various approximation methods for computing 
the Bayes Factors. 



We use the dataset from population eQTL study (Stranger et al. (2007)) discussed in section 3.3 for this 



purpose. For each of the 8,427 genes examined, we select the top associated cis-SNP based on the values 



ES 

of BFg^y and re-calculate the Bayes Factor directly based on (|A.6|) using a general adaptive Gaussian 



quadrature procedure (Note, because of its high computational cost in numerically evaluating multi- 
dimensional integrals, this numerical recipe does not apply in general practice). We treat these results as 
the "truth" and make comparison with BFg^^ and ABF^^ (with and without small sample corrections). 
Moreover, we convert various numerical results of Bayes Factors to log 10 scale and compute Root Mean 
Squared Errors (RMSE) for each approximation. 

The results of the numerical evaluation for the ES model shown in Table [5] and Figure [6} Although 
the sample sizes in each subgroup are quite small in this dataset (41 Europeans, 59 Asians and 41 
Africans) , the numerical results of BF^^ are almost identical to the results obtained from the adaptive 
Gaussian quadrature procedure (RMSE = 1.2 x 10~^ in log 10 scale). As expected, the approximate 
Bayes Factor, ABF^^, has the worst numerical performance, mainly due to the small sample sizes in 
this dataset. Nevertheless, the ranking of the SNPs by ABF^^ is quite consistent with true values (rank 
correlation = 0.99). Figure [g] suggests that under small sample situations, ABF^^ tends to over-evaluate 
the true value and this over-evaluation can become quite severe when the true values are extremely large. 
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On the other hand, the proposed small sample size correction method seems very effective: with this 
simple correction, the resulting A*BF^^ are quite accurate comparing with the true values. 



We also perform a similar experiment for the EE model using the same dataset with five levels of 
V^V^ + w"^ values: 0.1, 0.2, 0.4, 0.8, 1.6, and seven degrees of heterogeneities characterized by ip'^/w'^ 
values: 0, 1/4, 1/2, 1, 2, 4, oo, and we assign these 35 grid values equal prior weight. The results are 
similar with the case in the EE model and shown in Table [H 



logio(BFES) logio(ABFES) logio(A*BFES) 



RMSE 1.2 X 10-^ 



4.95 



0.14 



Table 5: Numerical accuracy of three approximations for evaluating Bayes Factors under the ES model. 



BF^^ is based on the first approximation of Laplace's method discussed in appendix 



computed using (2.23) and A*BF^^ is based on (C.ll) which is corrected for small samp 
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Figure 6: Comparison of approximate Bayes Factors before and after applying small sample size corrections. 
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logio(BFEE) logio(ABF^E) logio(A*BF 



av / 



RMSE 4.1 X 10"^ 



5.03 



0.09 



Table 6: Numerical accuracy of three approximations for evaluating Bayes Factors under the EE model. 
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